Using Multilingual Resources for Building SloWNet Faster
Abstract
This project report presents the results of an approach in which synsets for the Slovene wordnet were induced automatically from parallel corpora and existing wordnets. First, multilingual lexicons were extracted from word-aligned corpora and compared with wordnets in several languages in order to disambiguate the lexicon entries. Next, the appropriate synset IDs were attached to the Slovene entries in the lexicon. Finally, Slovene lexicon entries sharing the same synset ID were grouped into a synset. The results were evaluated against a gold standard and checked by hand.
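The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how lexicon entries are compared with the wordnets, so the sketch assumes one plausible reading: a lexicon entry is disambiguated by intersecting the synset IDs that its non-Slovene translations carry in their respective wordnets. The function name `induce_synsets`, the data layout, the synset IDs, and the toy entries are all hypothetical and are not taken from the report.

```python
from collections import defaultdict


def induce_synsets(lexicon, wordnets, target="sl"):
    """Group target-language lemmas into synsets (illustrative sketch only).

    lexicon  -- list of dicts mapping a language code to a lemma; each dict
                stands for one entry taken from a word-aligned corpus.
    wordnets -- dict: language code -> {lemma: set of synset IDs}, with the
                IDs assumed to be shared across languages.
    """
    synsets = defaultdict(set)
    for entry in lexicon:
        # Collect the candidate synset IDs of every non-target translation.
        candidates = [
            wordnets[lang].get(lemma, set())
            for lang, lemma in entry.items()
            if lang != target and lang in wordnets
        ]
        if not candidates:
            continue
        # Assumed disambiguation step: keep only IDs shared by all languages.
        shared = set.intersection(*candidates)
        # Attach each surviving ID to the target-language lemma.
        for synset_id in shared:
            synsets[synset_id].add(entry[target])
    return dict(synsets)


# Invented toy data; the synset IDs are placeholders, not real wordnet IDs.
wordnets = {
    "en": {"bank": {"n1-fin", "n2-river"}, "shore": {"n2-river"}},
    "fr": {"banque": {"n1-fin"}, "rive": {"n2-river"}},
}
lexicon = [
    {"sl": "banka", "en": "bank", "fr": "banque"},
    {"sl": "breg", "en": "bank", "fr": "rive"},
    {"sl": "obala", "en": "shore", "fr": "rive"},
]

result = induce_synsets(lexicon, wordnets)
```

In this toy run the ambiguous English "bank" is resolved by its French counterpart, so "banka" lands in one synset while "breg" and "obala" share another. Entries whose translations have no synset ID in common are simply skipped, which is one reason a manual check of the output remains useful.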
Similar resources
sloWNet: construction and corpus annotation
This paper presents a wordnet for Slovene which was created semi-automatically with a combination of approaches and multilingual resources, in particular a bilingual dictionary, a parallel corpus and Wikipedia. Analysis of the results shows that the dictionary approach yields a good core wordnet but requires substantial manual editing due to a lack of automatic word-sense disambiguation. This w...
Building a Chinese-English Mapping between Verb Concepts for Multilingual Applications
This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a Chinese-English lexicon for verbs, using thematic-role information to create links between Chinese and English conceptual information. We then present an approach to compensating for gaps in the existing resources. The resulting lexicon is...
Visualizing sloWNet
With the increasing popularity of semantic lexica such as wordnets that are being developed for more and more languages the need for tools which enable displaying and management of their content has risen as well. Dictionary writing systems or tools for ontology management are not suitable for use with wordnets because they are concept-based and relational on the one hand but less formal and mo...
Building Specialized Multilingual Lexical Graphs Using Community Resources
We are describing methods for compiling domain-dedicated multilingual terminological data from various resources. We focus on collecting data from online community users as a main source, therefore, our approach depends on acquiring contributions from volunteers (explicit approach), and it depends on analyzing users’ behaviors to extract interesting patterns and facts (implicit approach). As a ...
Multilingual Grammar Resources in Multilingual Application Development
Grammar development makes up a large part of the multilingual rule-based application development cycle. One way to decrease the required grammar development efforts is to base the systems on multilingual grammar resources. This paper presents a detailed description of a parametrization mechanism used for building multilingual grammar rules. We show how these rules, which had originally been des...